First hints of large scale structures in the ultra-high energy sky?

The recently published result [1] of a broad maximum around 25 degrees in the two-point
autocorrelation function of ultra-high energy cosmic ray arrival directions has been intriguingly
interpreted as the first imprint of the large scale structures (LSS) of baryonic matter in the near
universe. We analyze this suggestion in light of the clustering properties expected from the PSCz
astronomical catalogue of LSS. The chance probability of the signal is consistent within $2\sigma$ with the
predictions based on the catalogue. No evidence for a significant cross-correlation of the observed
events with known overdensities in the LSS is found, which may be due to the role of the galactic
and extragalactic magnetic fields and is in any case consistent with the limited statistics. The larger
statistics to be collected by the Pierre Auger Observatory are needed to answer the question definitively.



PACS numbers: 98.70.Sa 



I. INTRODUCTION 



The origin of ultra-high energy cosmic rays (UHECRs) 
is still an open problem and, at present, two different
classes of models compete to explain the most energetic 
events observed. In "bottom-up" mechanisms the accel- 
eration up to extreme energy occurs in suitable astro- 
physical environments, whereas in "top-down" scenarios 
UHECRs are produced by the decay or annihilation of 
super-massive relic particles in the halo of our Galaxy 
or by cosmological diffuse topological defects. The ob- 
servation that UHECR arrival directions (in particular 
at energies $E > 8 \times 10^{19}$ eV) may cluster according to
the underlying large scale structure (LSS) of the universe 
would represent clear evidence in favor of the "bottom-up"
mechanisms, and should co-exist with the flux suppression
known as the Greisen-Zatsepin-Kuzmin (GZK) effect [2, 3].
The challenging and fascinating problem
of determining at which energy (if any) astronomy with 
charged particles becomes possible is thus strictly related 
to the identification of the sources of UHECRs, which in 
turn would constrain the galactic and extragalactic mag- 
netic fields as well as the chemical composition of the pri- 
maries. The latter point is an important prerequisite to 
use UHECR data to study particle interactions at energy 
scales otherwise inaccessible to laboratory experiments. 

It is well known that the chances of performing cosmic ray
astronomy increase significantly at extremely high energy, in
particular because deflections in the galactic and extragalactic
magnetic fields decrease. Moreover, at $E > 4$-$5 \times 10^{19}$ eV
the opacity of the interstellar space to protons grows drastically
due to the photo-pion production
$p + \gamma_{\rm CMB} \to \pi^{0(+)} + p(n)$ on cosmic microwave
background (CMB) photons (GZK effect). A similar phenomenon occurs
at slightly different energies for heavier primaries via
photo-disintegration energy losses. Above this energy range, most
of the flux comes from sources within a distance of a few hundred
Mpc (see e.g. [4]), and this should facilitate the source
identification. Thus, the GZK feature in the spectrum and the
large-scale anisotropy should be correlated signatures.
The drawback is that in the trans-GZK regime the flux 
is greatly suppressed, even beyond the expected power- 
law extrapolation of UHECR spectrum, and instruments 
with huge collecting areas are required to accumulate suf- 
ficient statistics to attack the problem. A final answer is 
expected when the Pierre Auger Observatory [5, 6] will
have detected enough events. 

Until now, the experiments of the previous generation 
have collected $\mathcal{O}(100)$ events at $E > 4$-$5 \times 10^{19}$ eV,
and one may wonder if any useful hint of the UHECR 
sources already hides in the available catalogues. In the 
recent publication [1], the authors found some evidence of
a broad maximum of the two-point autocorrelation func- 
tion of UHECR arrival directions around 25 degrees. The 
evidence was obtained combining the data with energies 
above $4 \times 10^{19}$ eV of several UHECR experiments, after
an a priori adjustment of their energy scale. This sig- 
nal becomes significant only when several data-sets are 
added, but it is not caused solely by an incorrect com- 
bination of the exposure of different experiments. Both 
the signal itself and the exact value of the chance proba- 
bility have to be interpreted with care, since the authors 
did not fix a priori the search and cut criteria. Although 
the nominal value of the chance probability for the signal 
to arise from random fluctuations is around 0.01%, when 
taking into account a penalty factor of 30 they estimated 
the "true chance probability" of the signal to be of the
order of $P \sim 0.3\%$. The authors suggest that, given the
energy dependence of the signal and its angular scale, it 
might be interpreted as a first signature of the large-scale 
structure of UHECR sources and of intervening magnetic 
fields. 
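
The two probabilities quoted above are related by a simple multiplication, written out here for clarity. The symbols $P_{\rm nom}$, $P_{\rm true}$ and $f$ are labels introduced only for this illustration, and the product form is assumed to be the leading-order approximation valid for $f\,P_{\rm nom} \ll 1$:

```latex
P_{\rm true} \simeq f \, P_{\rm nom} = 30 \times 10^{-4} = 3 \times 10^{-3} = 0.3\% .
```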

The aim of this work is to test their qualitative interpretation
of the result in the light of the signal expected if UHECR data
reflect the large scale structure distribution of galaxies in the
nearby universe. In Ref. [7], we performed a forecast analysis for
the Pierre Auger Observatory, to derive the minimum statistics
needed to test the hypothesis that UHECRs trace the baryonic
distribution in the universe. Assuming proton primaries, we found
that a few hundred events at $E > 5 \times 10^{19}$ eV are necessary
at Auger to have reasonably high chances of identifying the
signature, independently of the details of the injection spectrum.
In this work we calculate the expected signal in terms of the
autocorrelation function, as in [1], for the presently available
statistics, and discuss quantitatively how well predictions based
on the LSS distribution can reproduce their findings. The method
and the results obtained are presented in Sec. II; in Sec. III we
briefly discuss our findings and conclude.

II. UHECR CLUSTERING ON MEDIUM 
SCALES AND LSS 

In our analysis, we closely follow the approach reported in [1],
using a similar dataset extracted from available publications or
talks of the AGASA [8], Yakutsk [9], SUGAR [10] and HiRes [11, 12]
collaborations. In particular, in order to match the flux
normalization of HiRes, the energies of the AGASA data must be
rescaled downwards by $\sim 30\%$, and the energies of the Yakutsk
and SUGAR data by $\sim 50\%$. We refer the reader to [1] for
further details.
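
The rescaling and energy cut described above can be sketched as follows. This is only an illustration: the factors 0.7 and 0.5 are the approximate values quoted in the text, while the function name, the tuple layout and the toy events are assumptions made here and are not taken from the actual datasets.

```python
# Approximate rescaling factors read off the text: AGASA energies are lowered
# by ~30%, Yakutsk and SUGAR by ~50%; HiRes defines the reference scale.
RESCALE = {"HiRes": 1.0, "AGASA": 0.7, "Yakutsk": 0.5, "SUGAR": 0.5}
E_CUT_EV = 4e19  # energy threshold applied after rescaling


def select_events(events, e_cut=E_CUT_EV):
    """Rescale each energy to the HiRes scale and keep events above e_cut.

    `events` is a list of (experiment, energy_eV, ra_deg, dec_deg) tuples
    (a hypothetical layout chosen for this sketch).
    """
    selected = []
    for experiment, energy, ra, dec in events:
        rescaled = energy * RESCALE[experiment]
        if rescaled >= e_cut:
            selected.append((experiment, rescaled, ra, dec))
    return selected


# Made-up events, for illustration only.
toy = [
    ("AGASA", 6.0e19, 10.0, 20.0),    # about 4.2e19 after rescaling: kept
    ("AGASA", 5.0e19, 50.0, -5.0),    # about 3.5e19 after rescaling: dropped
    ("SUGAR", 9.0e19, 200.0, -60.0),  # about 4.5e19 after rescaling: kept
    ("HiRes", 4.1e19, 120.0, 35.0),   # unchanged: kept
]
print(len(select_events(toy)))
```

Applying the cut after the rescaling, rather than before, is what makes the number of surviving events per experiment depend on the assumed energy scale.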

We define the (cumulative) autocorrelation function $w$
as a function of the separation angle $\delta$ as

$$ w(\delta) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} \Theta(\delta - \delta_{ij}), \qquad (1) $$

where $\Theta$ is the step function, $N$ the number of CRs
considered and $\delta_{ij} = \arccos(\cos\rho_i \cos\rho_j +
\sin\rho_i \sin\rho_j \cos(\phi_i - \phi_j))$ is the angular distance
between the two cosmic rays $i$ and $j$ with coordinates
$(\rho, \phi)$ on the sphere. We perform a large number
$M \sim 10^5$ of Monte Carlo simulations of $N$ data sampled from a
uniform distribution on the sky and for each realization $j$ we
calculate the autocorrelation function $w^{j}_{\rm iso}(\delta)$.
The sets of random data match the number of data for the different
experiments passing the cuts after rescaling, and are spatially
distributed according to the exposures of the experiments. The
formal probability $P(\delta)$ to observe an equal or larger value
of the autocorrelation function by chance is

$$ P(\delta) = \frac{1}{M} \sum_{j=1}^{M} \Theta\left[ w^{j}_{\rm iso}(\delta) - w_{*}(\delta) \right], \qquad (2) $$

where $w_{*}(\delta)$ is the observed value for the cosmic ray
dataset and the convention $\Theta(0) = 1$ is being used. Relatively
high values of $P$ and $1 - P$ indicate that the data are
consistent with the null hypothesis being used to generate
the comparison samples, while low values of $P$ or $1 - P$
indicate that the model is inappropriate to explain the
data. Note also that by construction the values of the function
$P(\delta)$ at different $\delta$ are not independent. Nonetheless,
studying the cumulative distribution function (as opposed to the
differential one) is the only realistic way to extract information
from a low statistics "noisy" sample. In addition, an
autocorrelation study, unlike the approach of Ref. [7] where a
$\chi^2$-analysis was used, relies only on the clustering
probability in the data, while any directionality in the signal is
lost. Although providing less compelling evidence, this method has
the advantage of being more robust against large magnetic
deflections. As long as the energy and the charge of primaries from
the same source are similar, their relative displacement should be
small compared with the absolute displacement with respect to their
sources. Thus it is natural to expect that the first (although more
ambiguous) hints of a signal may come from the study of $w(\delta)$.

FIG. 1: The solid line shows the chance probability $P(\delta)$ to
observe an equal or larger value of the autocorrelation function as
a function of the angular scale $\delta$ for the combination of
experimental data of HiRes+AGASA+Yakutsk+SUGAR as described in the
text. The dashed purple line is the same signal when cosmic rays
falling in the PSCz catalogue mask are disregarded. The dot-dashed
green line is the same quantity if the random events are sampled
according to the LSS distribution instead of a uniform one. The
dotted green line is the result for a sample proportional to the
square of the LSS distribution.
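
The cumulative pair count $w$ and the Monte Carlo chance probability $P$ defined in this section can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: it assumes a uniform exposure (the analysis in the text weights the random events by the experimental exposures), uses far fewer realizations than $M \sim 10^5$, and all function names are chosen here.

```python
import bisect
import math
import random


def angular_distance(a, b):
    """Angle in radians between directions a and b, each given as (rho, phi)."""
    (rho_i, phi_i), (rho_j, phi_j) = a, b
    c = (math.cos(rho_i) * math.cos(rho_j)
         + math.sin(rho_i) * math.sin(rho_j) * math.cos(phi_i - phi_j))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding


def autocorrelation(events, deltas):
    """Cumulative pair count w(delta): pairs separated by at most delta."""
    seps = sorted(angular_distance(events[i], events[j])
                  for i in range(1, len(events)) for j in range(i))
    return [bisect.bisect_right(seps, d) for d in deltas]


def sample_isotropic(n, rng):
    """n directions uniform on the sphere (assumption: uniform exposure)."""
    return [(math.acos(rng.uniform(-1.0, 1.0)), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n)]


def chance_probability(observed, deltas, n_mc=1000, seed=0):
    """P(delta): fraction of isotropic sets with w_iso(delta) >= w_obs(delta)."""
    rng = random.Random(seed)
    w_obs = autocorrelation(observed, deltas)
    hits = [0] * len(deltas)
    for _ in range(n_mc):
        w_iso = autocorrelation(sample_isotropic(len(observed), rng), deltas)
        for k, (w_r, w_o) in enumerate(zip(w_iso, w_obs)):
            hits[k] += w_r >= w_o  # Theta(0) = 1 convention
    return [h / n_mc for h in hits]


# Toy input: 20 isotropic directions standing in for real arrival directions.
toy = sample_isotropic(20, random.Random(1))
deltas = [math.radians(d) for d in (10, 25, 60, 180)]
print(chance_probability(toy, deltas, n_mc=200))
```

At $\delta = 180^\circ$ every pair is counted in both the data and the random sets, so the last entry is always 1. This is one way to see that the values of $P$ at different angles are not independent.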

Figure 1 summarizes our main results. The solid, black curve shows
that under the same assumptions as Ref. [1], we obtain the same
behavior for the function $P(\delta)$ (compare with their Fig. 5).
To proceed further, we have to compare the previous signature with
the one expected from a model of the LSS. As in [7], we use the
IRAS PSCz galaxy catalogue [13]. We refer to our previous work [7]
as well as to the original paper [13] for technical details about
the catalogue and about the calculation of the UHECR sky map (which
also takes into account energy losses) that we use in the
following. It is important to recall that the catalogue suffers
from an incomplete sky coverage. This includes a zone centered on
the galactic plane, caused by the galactic extinction, and a few
narrow stripes which were not observed with enough sensitivity by
the IRAS satellite. These regions are excluded from our analysis
with the use of the binary mask available with the PSCz catalogue
itself. This reduces the available sample (by about 10%) to 93
events and raises the nominal chance probability to 0.1% (Fig. 1,
dashed purple line). Note that this is a limitation of the
catalogue, not an intrinsic problem of the data or of the
theoretical prediction. The dot-dashed green line in Fig. 1 shows
the chance probability of the signature found in [1] if the random
events are sampled according to the LSS distribution (convolved
with the experimental exposures), rather than from a uniform one.
Finally, the dotted line shows the same result if the random events
are sampled according to the square of the LSS distribution, as one
would expect e.g. for a strongly biased population of sources.
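
The difference between the uniform, linear and quadratic null hypotheses lies only in the weights used to draw the random events. A minimal sketch on a pixelized sky is given below; the pixelization, the function name and the toy numbers are assumptions made for illustration, and a real analysis would also assign a position within each pixel.

```python
import random


def sample_pixels(density, exposure, n, power=1, seed=0):
    """Draw n pixel indices with probability proportional to
    density**power * exposure (power=1: linear bias, power=2: quadratic)."""
    rng = random.Random(seed)
    weights = [(d ** power) * e for d, e in zip(density, exposure)]
    return rng.choices(range(len(weights)), weights=weights, k=n)


# Toy four-pixel "sky": made-up densities; the zero exposure of the last
# pixel mimics a masked or unobserved region.
density = [1.0, 2.0, 4.0, 3.0]
exposure = [1.0, 1.0, 1.0, 0.0]

linear = sample_pixels(density, exposure, 20000, power=1)
quadratic = sample_pixels(density, exposure, 20000, power=2)
print(linear.count(2) / 20000, quadratic.count(2) / 20000)
```

In this toy example the densest pixel receives a fraction close to 4/7 of the events in the linear case and close to 16/21 in the quadratic one, which shows how a stronger bias concentrates the simulated events on the overdensities.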

The prominent minimum of [1] is greatly reduced when the LSS model
is used as null hypothesis instead of the uniform one; this effect
is even more prominent in the quadratic map. Also, the data are
less clustered than expected from a uniform distribution at
$\delta \sim 160^\circ$, where $P \sim 1$. This additional puzzling
feature disappears when using the LSS null hypothesis, as appears
clearly in Fig. 2, where we plot the function
$P(\delta) \times [1 - P(\delta)]$ for the same cases as in Fig. 1.
This function vanishes if either $P$ or $1 - P$ vanishes and has
the theoretical maximum value of 1/4. Thus, the higher its value,
the more consistent the data are with the underlying hypothesis.
Apart from the very small scales, where our results are unrealistic
since we did not include magnetic smearing or detector angular
resolution, the better concordance of the UHECR distribution with
the LSS distribution than with the uniform one is evident at any
scale. Taken at face value, our result implies a nominal
probability $P > 5\%$ that the main signature found in Ref. [1]
arises as a chance fluctuation from the LSS distribution. This
suggests that the clustering properties of LSS are in much better
agreement with the experimental data than a purely isotropic
distribution. This is not an unexpected feature given that, as
found in [7], the typical size on the sky of the clusters of
structures lies in the range $15^\circ$-$30^\circ$.
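
The maximum value of 1/4 quoted above for the product of $P$ and $1 - P$ follows from a one-line calculation. The symbol $f$ is introduced here only as shorthand for that product:

```latex
f(P) = P\,(1 - P), \qquad
\frac{df}{dP} = 1 - 2P = 0 \;\Rightarrow\; P = \tfrac{1}{2}, \qquad
f\!\left(\tfrac{1}{2}\right) = \tfrac{1}{4}, \qquad
f(0) = f(1) = 0 .
```

For instance, $P = 0.05$ gives $f = 0.0475$, while $P = 0.001$ gives $f \simeq 0.001$, so for small $P$ the plotted function is close to $P$ itself.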

The absolute scale of the curves shown in Fig. 1 is affected by an
uncertainty due to the true energy scale: we calculated the map
assuming that the HiRes energy scale is the correct one, in
agreement with Berezinsky et al.'s fit of the dip due to pair
production of protons on the CMB [14]. But if the true energy is
higher, as a compromise solution with the other experiments may
require, the chance probability is slightly higher. In this
respect, one may look at our result as a conservative one. Hence,
the largest sample of [1], which we chose on the basis of the
strongest signal, is consistent within "$2\sigma$" with the
clustering properties expected from the LSS distribution.$^{1}$
Also given the fact that the "true probability" is higher than the
nominal one (due to the penalty factor of the search performed a
posteriori in [1]), this may be considered as an argument in favor
of their interpretation.

$^{1}$ Consistent within "$2\sigma$" means here at least in 5% of the cases.
The distribution is indeed far from Gaussian, and the number of
$\sigma$ can be used in its loose sense only.

FIG. 2: The function $P(\delta) \times [1 - P(\delta)]$ for the same cases
shown in Fig. 1. See text for details.

In order to understand why their result is consistent with our
sampled catalogue, it is useful to look at the chance probability
of the autocorrelation signal of events sampled from the LSS
according to the experimental exposures, assuming linear
correlation. We show this function in Fig. 3 for two samples of
$N = 93$ and $N = 279$ data, respectively the same statistics as
the dashed and dot-dashed curves in Fig. 1 and a factor 3 higher.
The curves are obtained as follows: a large number $M$
($\sim 10^4$) of Monte Carlo realizations of $N$ events is sampled
according to the LSS probability distribution, and for each
realization $i$ we calculate the function
$w^{i}_{\rm LSS}(\delta)$. We analogously generate $M$ random
datasets from a uniform distribution, and calculate
$w^{j}_{\rm iso}(\delta)$. We have thus $M^2$ independent couples
of functions $(w^{i}_{\rm LSS}, w^{j}_{\rm iso})$. The fraction of
the $M^2$ simulations where the condition
$w^{j}_{\rm iso}(\delta) \geq w^{i}_{\rm LSS}(\delta)$ is fulfilled
is the probability

$$ P_{\rm LSS}(\delta) = \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} \Theta\left[ w^{j}_{\rm iso}(\delta) - w^{i}_{\rm LSS}(\delta) \right], \qquad (3) $$

which is the function shown in Fig. 3.
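
The double sum over all couples of realizations need not be evaluated term by term. A possible sketch, at one fixed angular scale, sorts the isotropic values once and then uses a binary search for each LSS value. The function name and the toy numbers are assumptions made here, and ties are counted as successes, following the step-function convention adopted for the chance probability of the observed data.

```python
import bisect


def pairwise_fraction(w_iso, w_lss):
    """Fraction of all (i, j) pairs with w_iso[j] >= w_lss[i].

    Sorting w_iso once replaces the explicit double loop over all pairs
    by one binary search per LSS realization.
    """
    iso_sorted = sorted(w_iso)
    m = len(iso_sorted)
    hits = sum(m - bisect.bisect_left(iso_sorted, w) for w in w_lss)
    return hits / (m * len(w_lss))


# Made-up pair counts at one fixed angular scale, for illustration only.
w_iso = [10, 12, 15, 11]
w_lss = [12, 16]
print(pairwise_fraction(w_iso, w_lss))
```

With $M \sim 10^4$ realizations of each kind, this reduces the work per angular bin from about $10^8$ comparisons to about $10^4$ binary searches.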

An important qualitative feature is that the shape of the curve
indeed presents a broad minimum at scales of $\delta < 30^\circ$,
and a moderate plateau at scales of $70^\circ < \delta < 130^\circ$.
As shown by the $N = 279$ case, the first feature in particular is
intrinsic to the data: the more data are sampled, the more enhanced
it appears. This is also the trend shown in [1] when enlarging the
experimental statistics considered. Also, the higher the energy cut
in the map, the stronger the signature, since the local structures
become more and more prominent. Finally, the LSS data samples are
typically less clustered than the uniform ones at
$\delta > 150^\circ$ ($P_{\rm LSS} > 0.5$).

On the other hand, the minimum found in Fig. 3 for the sample of 93
data is much less prominent than the one shown by the dashed curve
in Fig. 1. This explains why the consistency is "only" at the level
of $\sim 5\%$. This fact is not unexpected, given the predictions
of Ref. [7]: $\sim 100$ events are too few to guarantee a detection
of the imprint of the LSS with a high significance. Consistently
with the results of Fig. 1, we checked that the dip in Fig. 3
becomes more pronounced if we use a quadratic bias with LSS.
Although not statistically significant at the moment, a
confirmation of a highly clustered signal at intermediate scales
may thus suggest a more than linear correlation of UHECR sources
with the galaxy density field. Alternatively, the signal may be due
to the magnetic smearing of a few relatively strong point-sources.

FIG. 3: Chance probability $P_{\rm LSS}(\delta)$ to observe a larger
value of the autocorrelation function as a function of the angular
scale $\delta$ for two samples of 93 and 279 data according to the
LSS map cut at $4 \times 10^{19}$ eV. The three lower curves for the
$N = 279$ case show the signature when the map cut at
$E > 2, 4, 5 \times 10^{19}$ eV (from top to bottom) is used, while
the upper solid curve refers to the case $N = 93$,
$E > 4 \times 10^{19}$ eV.

Clearly, a smoking gun in favor of the LSS distribution would be a
correlation between the data and the expected excess in the LSS
map. By performing an analysis similar to the previous one, but in
terms of the cross-correlation function between simulated data and
sampled ones, we did not find any evidence favoring an LSS origin
with respect to the uniform case. Actually this is not unexpected
within the model considered in [7], since $\sim 100$ data at energy
$> 4 \times 10^{19}$ eV are still too few to draw a firm conclusion
in this sense. However, the lack of this signature may also be
related to the role of intervening magnetic fields. Acting on an
energetically (and possibly chemically) inhomogeneous sample,
magnetic fields may displace the observed positions with respect to
the original ones in a non-trivial way, without evidence for a
characteristic scale, at least in a poor statistics regime. A
possible hint towards a non-negligible role of magnetic fields is
also given by the fact that the dip in the LSS signal is already
present at relatively small angles. This feature may have
disappeared in the UHECR sample due to a smearing effect of the
magnetic fields.



III. DISCUSSION AND CONCLUSIONS 

We have analyzed the hypothesis that the broad maximum of the
two-point autocorrelation function of the UHECR arrival directions
around $25^\circ$ found in Ref. [1] may be due to the imprint of
the LSS. We have concluded that this suggestion is at least
partially supported by the UHECR sky map constructed starting from
an LSS catalogue. Even their nominal (not corrected for the penalty
factor) result for the autocorrelation function is consistent
within $2\sigma$ with our expectations. A stronger correlation with
source luminosity or a more-than-linear bias with overdensity may
improve the agreement. Also, the correlation may not be directly
with the LSS themselves: any class of sources which is numerous
enough is expected to show some indication in favor of this
correlation. The low statistics and the role of the magnetic field
deflections may explain why no significant cross-correlation
between data and LSS overdensities is found.

The authors of Ref. [1] also claim that, if the signal found is
real, a heavy composition of the UHECRs is disfavored. However, we
note that a heavy or mixed composition of the UHECRs may well be
consistent with the signature. If we limit ourselves to the role of
the (relatively well known) galactic magnetic field, a naive
extrapolation of the simulations performed in [15] would indicate,
in the linear regime, deflections for iron nuclei of about
$130^\circ$ with respect to the incoming direction. UHE iron nuclei
would then be in a transition from the diffusive to the ballistic
regime. Nonetheless, the signal is sensitive to the relative
deflections of "bunches" of cosmic rays originating from a similar
region of the extragalactic sky, for which typical models of the
regular galactic magnetic field predict a smearing $< 40^\circ$
even for iron nuclei, as long as their energies do not differ by
more than about 30%. One may even speculate that the second dip at
large angles (arising from cross-correlation of different groups of
overdensities) might originate from primaries of different rigidity
coming from the same few sources, split apart by intervening
fields. Also, the consideration in [1] that accounting for the
medium scale structure in UHECRs may change the significance of
claims of small-scale clustering should be carefully examined. If
the picture emerging from Ref. [1] and this paper is consistent,
both LSS and magnetic fields play a role in shaping the signal;
otherwise it is hard to explain the lack of cross-correlation with
known overdensities of LSS. "Filaments and voids" in the observed
data do not match the position of filaments and voids in the LSS.
But if they are nonetheless connected, this difference must be
rigidity-dependent. Thus, a cluster of events with high rigidity
may well arise in a void of the presently known UHECR filamentary
structure, which may sit closer to an overdensity of the LSS.
Indeed, the clustered component of the AGASA data favoring
small-scale clustering does show a different energy spectrum than
the non-clustered component. This discussion emphasizes that,
unfortunately, it is virtually impossible to draw strong
conclusions at present, even assuming that the clustering at
intermediate scales is physical.

In conclusion, the analysis performed in this paper does not
exclude that the signal found in [1] may be due to the imprint of
the LSS, and indeed gives some support in this sense. The larger
statistics that the Auger Observatory is going to collect in the
next years are definitely needed to tell us finally whether
astronomy is possible with UHECRs or, equivalently, whether we will
ever be able to look at the sky with new and "ultra-energetic" eyes.